Filter English FTS query noise #107

Merged
SonAIengine merged 1 commit into main from codex/sqlite-fts-query-filter
Jul 2, 2026
@SonAIengine

Contributor

Summary

  • clean English FTS query terms by dropping high-frequency question "glue" words and punctuation before FTS5 OR matching
  • preserve stopword-only queries, Korean/non-ASCII terms, and non-ASCII Latin tokens
  • document the MS MARCO 1M reuse improvement
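
The filtering described in the summary could look roughly like the sketch below. This is not the code from `src/synaptic/backends/sqlite.py`; the function name `build_fts_or_query`, the stopword set, and the tokenizing regex are all hypothetical and chosen only to illustrate the three behaviours listed above: dropping ASCII glue words and punctuation, keeping non-ASCII tokens, and falling back to the original terms when a query is stopword-only.

```python
import re

# Hypothetical stopword set for illustration; the list used in the PR is not shown.
ENGLISH_QUERY_STOPWORDS = frozenset(
    {"a", "an", "the", "is", "are", "was", "what", "who", "how",
     "does", "do", "of", "in", "to", "for", "and", "or"}
)

# \w+ keeps letters, digits, and underscores in any script and drops punctuation.
_TOKEN_RE = re.compile(r"\w+")


def build_fts_or_query(raw_query: str) -> str:
    """Return an FTS5 MATCH expression that ORs the cleaned query terms."""
    tokens = _TOKEN_RE.findall(raw_query)

    # Only pure-ASCII tokens are checked against the English stopword set,
    # so Korean terms and accented Latin tokens always survive.
    kept = [
        tok for tok in tokens
        if not (tok.isascii() and tok.lower() in ENGLISH_QUERY_STOPWORDS)
    ]

    # A stopword-only query would otherwise become empty; keep it unchanged.
    if not kept:
        kept = tokens

    # Double-quote each term so FTS5 reads it as a string, not as query syntax.
    return " OR ".join(f'"{tok}"' for tok in kept)
```

Under these assumptions, `build_fts_or_query("What is the capital of France?")` yields `"capital" OR "France"`, which gives FTS5 far fewer high-frequency posting lists to union than the unfiltered query.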

1M result

On the local persistent MS MARCO 1M SQLite DB:

  • before: build 0.0s, search 70.1s, MRR@10 0.462, Hit@10 30/50
  • after: build 0.0s, search 9.1s, MRR@10 0.479, Hit@10 31/50
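
For readers unfamiliar with the two quality metrics, here is a minimal sketch of how MRR@10 and Hit@10 are conventionally computed over a query subset. The helper names `reciprocal_rank_at_k` and `evaluate` are hypothetical, and the benchmark script may compute these values differently.

```python
from typing import Iterable, Sequence, Set, Tuple


def reciprocal_rank_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 10) -> float:
    """Return 1/rank of the first relevant document in the top k, or 0.0 if none."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def evaluate(runs: Iterable[Tuple[Sequence[str], Set[str]]], k: int = 10) -> Tuple[float, int]:
    """Return (MRR@k, number of queries with at least one relevant hit in the top k)."""
    reciprocal_ranks = [reciprocal_rank_at_k(ranked, relevant, k) for ranked, relevant in runs]
    hits = sum(1 for rr in reciprocal_ranks if rr > 0)
    mrr = sum(reciprocal_ranks) / len(reciprocal_ranks) if reciprocal_ranks else 0.0
    return mrr, hits
```

Read this way, "Hit@10 31/50" means 31 of the 50 sampled queries had a relevant passage in the top 10 results.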

Tests

  • uv run --extra dev ruff check src/synaptic/backends/sqlite.py tests/test_backend_sqlite.py
  • uv run --extra dev ruff format --check src/synaptic/backends/sqlite.py tests/test_backend_sqlite.py
  • uv run --extra dev pytest tests/test_backend_sqlite.py tests/test_tier1_benchmarks.py -q
  • PYTHONUNBUFFERED=1 uv run --extra sqlite python examples/ablation/run_tier1_benchmarks.py --only msmarco --subset 50 --corpus-limit 1000000 --use-sqlite-graph --sqlite-db-path tests/benchmark/data/msmarco_1m.db --reuse-sqlite-db --progress-every 100000

@SonAIengine SonAIengine merged commit b1e575e into main Jul 2, 2026
2 checks passed
@SonAIengine SonAIengine deleted the codex/sqlite-fts-query-filter branch July 2, 2026 04:08